OLAP textual aggregation approach using the Google similarity distance
Abstract
Data warehousing and On-Line Analytical Processing (OLAP) are essential elements of decision support. For textual data, decision support requires new tools, mainly textual aggregation functions, for better and faster high-level analysis and decision making. Such tools provide textual measures to users who wish to analyse documents online. In this paper, we propose a new aggregation function for textual data in an OLAP context based on the K-means method. This approach produces aggregates that are semantically richer than those provided by classical OLAP operators. The distance used in K-means is replaced by the Google similarity distance, which takes into account the semantic similarity of keywords when aggregating them. The performance of our approach is analyzed and compared to other methods such as Topkeywords, TOPIC, TuBE and BienCube. The experimental study shows that our approach achieves better performance in terms of recall, precision, F-measure, complexity and runtime.
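As a rough illustration of the idea in the abstract, the sketch below computes the Normalized Google Distance (NGD) between keywords from occurrence counts and uses it in a K-means-style loop. The abstract does not give the paper's actual procedure, so several parts are assumptions: the counts in `freq` and `cofreq` are invented toy values, the index size `N` is hypothetical, and cluster centers are updated as medoids (the member closest to all others) because keywords have no arithmetic mean under NGD.

```python
import math

def ngd(fx, fy, fxy, n):
    """Normalized Google Distance from occurrence counts.

    fx, fy: number of documents containing each term; fxy: documents
    containing both; n: total number of indexed documents.
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        return float("inf")  # terms that never co-occur are maximally distant
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))

# Hypothetical toy counts (not taken from the paper).
N = 1_000_000
freq = {"olap": 1000, "cube": 2000, "warehouse": 1500,
        "gene": 3000, "protein": 2500}
cofreq = {
    frozenset(("olap", "cube")): 800,
    frozenset(("olap", "warehouse")): 600,
    frozenset(("cube", "warehouse")): 700,
    frozenset(("gene", "protein")): 2000,
}

def distance(a, b):
    """NGD between two keywords, looked up in the toy count tables."""
    if a == b:
        return 0.0
    return ngd(freq[a], freq[b], cofreq.get(frozenset((a, b)), 0), N)

def assign(keywords, centers):
    """Attach each keyword to its nearest center."""
    clusters = {c: [] for c in centers}
    for k in keywords:
        nearest = min(centers, key=lambda c: distance(k, c))
        clusters[nearest].append(k)
    return clusters

def update_centers(clusters):
    """Assumed update step: pick the medoid of each non-empty cluster."""
    return [
        min(members, key=lambda m: sum(distance(m, o) for o in members))
        for members in clusters.values() if members
    ]

def cluster(keywords, centers, max_iter=10):
    """Alternate assignment and medoid update until centers stop changing."""
    for _ in range(max_iter):
        new_centers = update_centers(assign(keywords, centers))
        if set(new_centers) == set(centers):
            break
        centers = new_centers
    return assign(keywords, centers)

if __name__ == "__main__":
    print(cluster(list(freq), ["olap", "gene"]))
```

In a real setting the counts would come from a search index or from the document corpus itself, and the resulting clusters of semantically close keywords would serve as the aggregated textual measure for an OLAP cell.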
Similar resources
Top_Keyword: An Aggregation Function for Textual Document OLAP
For more than a decade, research on OLAP and multidimensional databases has generated methodologies, tools and resource management systems for the analysis of numeric data. With the growing availability of digital documents, there is a need for incorporating text-rich documents within multidimensional databases as well as an adapted framework for their analysis. This paper presents a new agg...
OLAP aggregation function for textual data warehouse
For more than a decade, OLAP and multidimensional analysis have generated methodologies, tools and resource management systems for the analysis of numeric data. With the growing availability of semistructured data there is a need for incorporating text-rich document data in a data warehouse and providing adapted multidimensional analysis. This paper presents a new aggregation function for keywo...
Binary-class and Multi-class based Textual Entailment System
The article presents the experiments carried out as part of the participation in Recognizing Inference in TExt (RITE-2) @NTCIR10 for Japanese. RITE-2 has four subtasks: the Binary-class (BC) subtask for Japanese and Chinese, the Multi-class (MC) subtask for Japanese and Chinese, Entrance Exam for Japanese and RITE4QA for Chinese. We have submitted three runs in BC subtask for Japanese (JA) (one run), Ch...
Content aggregation in natural language hypertext summarization of OLAP and Data Mining Discoveries
We present a new approach to paratactic content aggregation in the context of generating hypertext summaries of OLAP and data mining discoveries. Two key properties make this approach innovative and interesting: (1) it encapsulates aggregation inside the sentence planning component, and (2) it relies on a domain independent algorithm working on a data structure that abstracts from lexical and s...
A new last aggregation compromise solution approach based on TOPSIS method with hesitant fuzzy setting to energy policy evaluation
Utilizing renewable energies is identified as one of the significant issues of economic and social importance for future human life. Thus, choosing the best renewable energy among the candidates is all the more important. To address this issue, multi-criteria group decision making (MCGDM) methods with imprecise information could be employed. The aim of this paper is ...
Journal title:
- IJBIDM
Volume 11, Issue
Pages -
Publication date 2016